ACL - 05 Computational Approaches to Semitic Languages

نویسنده

  • Nizar Habash
چکیده

We explore the application of memorybased learning to morphological analysis and part-of-speech tagging of written Arabic, based on data from the Arabic Treebank. Morphological analysis – the construction of all possible analyses of isolated unvoweled wordforms – is performed as a letter-by-letter operation prediction task, where the operation encodes segmentation, part-of-speech, character changes, and vocalization. Part-of-speech tagging is carried out by a bi-modular tagger that has a subtagger for known words and one for unknown words. We report on the performance of the morphological analyzer and part-of-speech tagger. We observe that the tagger, which has an accuracy of 91.9% on new data, can be used to select the appropriate morphological analysis of words in context at a precision of 64.0 and a recall of 89.7.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Can You Tag the Modal? You Should

Computational linguistics methods are typically first developed and tested in English. When applied to other languages, assumptions from English data are often applied to the target language. One of the most common such assumptions is that a “standard” part-of-speech (POS) tagset can be used across languages with only slight variations. We discuss in this paper a specific issue related to the d...

متن کامل

Integrated Morphological and Syntactic Disambiguation for Modern Hebrew

Current parsing models are not immediately applicable for languages that exhibit strong interaction between morphology and syntax, e.g., Modern Hebrew (MH), Arabic and other Semitic languages. This work represents a first attempt at modeling morphological-syntactic interaction in a generative probabilistic framework to allow for MH parsing. We show that morphological information selected in tan...

متن کامل

Bayesian phylogenetic analysis of Semitic languages identifies an Early Bronze Age origin of Semitic in the Near East.

The evolution of languages provides a unique opportunity to study human population history. The origin of Semitic and the nature of dispersals by Semitic-speaking populations are of great importance to our understanding of the ancient history of the Middle East and Horn of Africa. Semitic populations are associated with the oldest written languages and urban civilizations in the region, which g...

متن کامل

Modifying a Natural Language Processing System for European Languages to Treat Arabic in Information Processing and Information Retrieval Applications

The goal of many natural language processing platforms is to be able to someday correctly treat all languages. Each new language, especially one from a new language family, provokes some modification and design changes. Here we present the changes that we had to introduce into our platform designed for European languages in order to handle a Semitic language. Treatment of Arabic was successfull...

متن کامل

A Treebank of Ugaritic. Annotating Fragmentary Attested Languages

The paper presents an outline of a treebank of Ugaritic, an extinct Semitic language. It describes the basic structure of the treebank, and possibility of re-using approaches applied to other Semitic languages. It also discusses problems of analyzing a language attested in a fragmentary form and possible usage of a treebank based approaches for further reconstruction of text passages.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005